Dataset statistics
| Number of variables | 26 |
|---|---|
| Number of observations | 100000 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 19.8 MiB |
| Average record size in memory | 208.0 B |
Variable types
| CAT | 16 |
|---|---|
| NUM | 7 |
| BOOL | 2 |
| DATE | 1 |
Warnings
| Warning | Type |
|---|---|
| crash_time has a high cardinality: 1440 distinct values | High cardinality |
| location has a high cardinality: 44605 distinct values | High cardinality |
| on_street_name has a high cardinality: 4328 distinct values | High cardinality |
| combine_location has a high cardinality: 44606 distinct values | High cardinality |
| nearest_street has a high cardinality: 27727 distinct values | High cardinality |
| number_of_motorist_injured is highly correlated with number_of_persons_injured | High correlation |
| number_of_persons_injured is highly correlated with number_of_motorist_injured | High correlation |
| crash_year is highly correlated with collision_id | High correlation |
| collision_id is highly correlated with crash_year | High correlation |
| collision_id has unique values | Unique |
| number_of_persons_injured has 72699 (72.7%) zeros | Zeros |
| number_of_pedestrians_injured has 95454 (95.5%) zeros | Zeros |
| number_of_motorist_injured has 81887 (81.9%) zeros | Zeros |
Reproduction
| Analysis started | 2020-12-13 10:03:11.879583 |
|---|---|
| Analysis finished | 2020-12-13 10:06:38.830317 |
| Duration | 3 minutes and 26.95 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
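As a rough sketch of how a comparable report could be regenerated with pandas-profiling: the helper name `build_report`, the file names and the report title below are hypothetical placeholders, and the settings of the original `config.yaml` are not reproduced here, so the output may differ from the report above.

```python
def build_report(csv_path, html_path, title="Collision data profile"):
    """Profile a CSV file and write an HTML report (a sketch, not the original run)."""
    # Imported lazily so that defining this helper does not require the libraries.
    import pandas as pd
    from pandas_profiling import ProfileReport

    # Parse crash_date as a date so it can be profiled as a date variable.
    df = pd.read_csv(csv_path, parse_dates=["crash_date"])
    profile = ProfileReport(df, title=title)
    profile.to_file(html_path)
    return html_path
```

Calling `build_report("collisions.csv", "collisions_profile.html")` (both file names assumed) would then write the HTML report next to the input file.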
crash_date
Date
| Distinct | 551 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 781.2 KiB |
| Minimum | 2013-03-23 00:00:00 |
|---|---|
| Maximum | 2020-09-29 00:00:00 |
crash_time
Categorical
| Distinct | 1440 |
|---|---|
| Distinct (%) | 1.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 781.2 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0:00 | 1637 | 1.6% |
| 17:00 | 1363 | 1.4% |
| 16:00 | 1360 | 1.4% |
| 14:00 | 1298 | 1.3% |
| 15:00 | 1246 | 1.2% |
| 18:00 | 1231 | 1.2% |
| 13:00 | 1153 | 1.2% |
| 12:00 | 1103 | 1.1% |
| 19:00 | 996 | 1.0% |
| 10:00 | 971 | 1.0% |
| Other values (1430) | 87642 | 87.6% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 4.74399 |
| Min length | 4 |
borough
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 781.2 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| BROOKLYN | 32585 | 32.6% |
| QUEENS | 27781 | 27.8% |
| BRONX | 18899 | 18.9% |
| MANHATTAN | 17695 | 17.7% |
| STATEN ISLAND | 3040 | 3.0% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 13 |
|---|---|
| Median length | 8 |
| Mean length | 7.20636 |
| Min length | 5 |
zip_code
Real number (ℝ≥0)
| Distinct | 203 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10876.94022 |
|---|---|
| Minimum | 10000 |
| Maximum | 11697 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 781.2 KiB |
Quantile statistics
| Minimum | 10000 |
|---|---|
| 5-th percentile | 10013 |
| Q1 | 10455 |
| median | 11208 |
| Q3 | 11249 |
| 95-th percentile | 11429 |
| Maximum | 11697 |
| Range | 1697 |
| Interquartile range (IQR) | 794 |
Descriptive statistics
| Standard deviation | 533.7490418 |
|---|---|
| Coefficient of variation (CV) | 0.04907161674 |
| Kurtosis | -1.371981632 |
| Mean | 10876.94022 |
| Median Absolute Deviation (MAD) | 206 |
| Skewness | -0.5167109366 |
| Sum | 1087694022 |
| Variance | 284888.0396 |
| Monotonicity | Not monotonic |
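Several of the figures above are derived from the basic ones. As a quick consistency check, the following sketch recomputes them using only the zip_code numbers reported above; the variable names are illustrative.

```python
# Values copied from the zip_code tables above.
mean = 10876.94022
std = 533.7490418
minimum, maximum = 10000, 11697
q1, q3 = 10455, 11249

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance

print(value_range, iqr, cv, variance)
```

The recomputed values agree with the reported Range, IQR, CV and Variance up to the rounding of the inputs.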
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 11207 | 2220 | 2.2% |
| 11212 | 1563 | 1.6% |
| 11385 | 1539 | 1.5% |
| 11208 | 1526 | 1.5% |
| 11236 | 1460 | 1.5% |
| 11434 | 1376 | 1.4% |
| 11368 | 1337 | 1.3% |
| 11203 | 1303 | 1.3% |
| 11234 | 1296 | 1.3% |
| 10457 | 1220 | 1.2% |
| Other values (193) | 85160 | 85.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10000 | 20 | < 0.1% |
| 10001 | 778 | 0.8% |
| 10002 | 802 | 0.8% |
| 10003 | 526 | 0.5% |
| 10004 | 145 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 11697 | 27 | < 0.1% |
| 11695 | 1 | < 0.1% |
| 11694 | 174 | 0.2% |
| 11693 | 203 | 0.2% |
| 11692 | 182 | 0.2% |
location
Categorical
| Distinct | 44605 |
|---|---|
| Distinct (%) | 44.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 781.2 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| unspecified | 8204 | 8.2% |
| (40.861862, -73.91282) | 79 | 0.1% |
| (40.8047, -73.91243) | 55 | 0.1% |
| (40.820305, -73.89083) | 52 | 0.1% |
| (40.696033, -73.98453) | 48 | < 0.1% |
| (40.675735, -73.89686) | 48 | < 0.1% |
| (40.658577, -73.89063) | 47 | < 0.1% |
| (40.737785, -73.93496) | 43 | < 0.1% |
| (40.733536, -73.87035) | 41 | < 0.1% |
| (40.66496, -73.82226) | 40 | < 0.1% |
| Other values (44595) | 91343 | 91.3% |
Unique
| Unique | 29003 |
|---|---|
| Unique (%) | 29.0% |
Length
| Max length | 25 |
|---|---|
| Median length | 22 |
| Mean length | 20.85131 |
| Min length | 11 |
on_street_name
Categorical
| Distinct | 4328 |
|---|---|
| Distinct (%) | 4.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 781.2 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| unspecified | 26009 | 26.0% |
| BELT PARKWAY | 1616 | 1.6% |
| LONG ISLAND EXPRESSWAY | 1053 | 1.1% |
| BROOKLYN QUEENS EXPRESSWAY | 956 | 1.0% |
| BROADWAY | 863 | 0.9% |
| FDR DRIVE | 852 | 0.9% |
| GRAND CENTRAL PKWY | 820 | 0.8% |
| ATLANTIC AVENUE | 717 | 0.7% |
| MAJOR DEEGAN EXPRESSWAY | 674 | 0.7% |
| CROSS BRONX EXPY | 652 | 0.7% |
| Other values (4318) | 65788 | 65.8% |
Unique
| Unique | 1369 |
|---|---|
| Unique (%) | 1.4% |
Length
| Max length | 32 |
|---|---|
| Median length | 32 |
| Mean length | 26.53811 |
| Min length | 11 |
number_of_persons_injured
Real number (ℝ≥0)
| Distinct | 13 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.37196 |
|---|---|
| Minimum | 0 |
| Maximum | 15 |
| Zeros | 72699 |
| Zeros (%) | 72.7% |
| Memory size | 781.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 15 |
| Range | 15 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.7439161865 |
|---|---|
| Coefficient of variation (CV) | 1.999989748 |
| Kurtosis | 16.5808926 |
| Mean | 0.37196 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.147118256 |
| Sum | 37196 |
| Variance | 0.5534112925 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 72699 | 72.7% |
| 1 | 21011 | 21.0% |
| 2 | 4125 | 4.1% |
| 3 | 1308 | 1.3% |
| 4 | 523 | 0.5% |
| 5 | 196 | 0.2% |
| 6 | 77 | 0.1% |
| 7 | 36 | < 0.1% |
| 8 | 14 | < 0.1% |
| 9 | 5 | < 0.1% |
| Other values (3) | 6 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 72699 | 72.7% |
| 1 | 21011 | 21.0% |
| 2 | 4125 | 4.1% |
| 3 | 1308 | 1.3% |
| 4 | 523 | 0.5% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 15 | 1 | < 0.1% |
| 11 | 3 | < 0.1% |
| 10 | 2 | < 0.1% |
| 9 | 5 | < 0.1% |
| 8 | 14 | < 0.1% |
number_of_persons_killed
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 781.2 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 99816 | 99.8% |
| 1 | 176 | 0.2% |
| 2 | 7 | < 0.1% |
| 3 | 1 | < 0.1% |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
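The counts above are consistent with Distinct meaning the number of different values and Unique meaning the number of values that occur exactly once. A short sketch re-using the frequencies reported for this variable (the variable names are illustrative):

```python
from collections import Counter

# Value counts copied from the number_of_persons_killed table above.
counts = Counter({0: 99816, 1: 176, 2: 7, 3: 1})

n_rows = sum(counts.values())                         # number of observations
n_distinct = len(counts)                              # different values
n_unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once

print(n_rows, n_distinct, n_unique)
```

Here only the value 3 occurs a single time, which matches the reported Unique count of 1 against a Distinct count of 4.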
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
number_of_pedestrians_injured
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.04739 |
|---|---|
| Minimum | 0 |
| Maximum | 6 |
| Zeros | 95454 |
| Zeros (%) | 95.5% |
| Memory size | 781.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.2234383296 |
|---|---|
| Coefficient of variation (CV) | 4.714883512 |
| Kurtosis | 38.41899762 |
| Mean | 0.04739 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 5.270026474 |
| Sum | 4739 |
| Variance | 0.04992468715 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 95454 | 95.5% |
| 1 | 4383 | 4.4% |
| 2 | 142 | 0.1% |
| 3 | 17 | < 0.1% |
| 6 | 2 | < 0.1% |
| 5 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 95454 | 95.5% |
| 1 | 4383 | 4.4% |
| 2 | 142 | 0.1% |
| 3 | 17 | < 0.1% |
| 4 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 2 | < 0.1% |
| 5 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
| 3 | 17 | < 0.1% |
| 2 | 142 | 0.1% |
number_of_pedestrians_killed
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 781.2 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 99936 | 99.9% |
| 1 | 64 | 0.1% |
number_of_cyclist_injured
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 781.2 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 95147 | 95.1% |
| 1 | 4744 | 4.7% |
| 2 | 107 | 0.1% |
| 3 | 2 | < 0.1% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
number_of_cyclist_killed
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 781.2 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 99975 | > 99.9% |
| 1 | 25 | < 0.1% |
number_of_motorist_injured
Real number (ℝ≥0)
| Distinct | 13 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.27492 |
|---|---|
| Minimum | 0 |
| Maximum | 15 |
| Zeros | 81887 |
| Zeros (%) | 81.9% |
| Memory size | 781.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 2 |
| Maximum | 15 |
| Range | 15 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.711058401 |
|---|---|
| Coefficient of variation (CV) | 2.586419326 |
| Kurtosis | 21.76181987 |
| Mean | 0.27492 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.819224309 |
| Sum | 27492 |
| Variance | 0.5056040496 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 81887 | 81.9% |
| 1 | 12243 | 12.2% |
| 2 | 3767 | 3.8% |
| 3 | 1259 | 1.3% |
| 4 | 523 | 0.5% |
| 5 | 189 | 0.2% |
| 6 | 73 | 0.1% |
| 7 | 34 | < 0.1% |
| 8 | 14 | < 0.1% |
| 9 | 5 | < 0.1% |
| Other values (3) | 6 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 81887 | 81.9% |
| 1 | 12243 | 12.2% |
| 2 | 3767 | 3.8% |
| 3 | 1259 | 1.3% |
| 4 | 523 | 0.5% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 15 | 1 | < 0.1% |
| 11 | 3 | < 0.1% |
| 10 | 2 | < 0.1% |
| 9 | 5 | < 0.1% |
| 8 | 14 | < 0.1% |
number_of_motorist_killed
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 781.2 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 99904 | 99.9% |
| 1 | 89 | 0.1% |
| 2 | 6 | < 0.1% |
| 3 | 1 | < 0.1% |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
contributing_factor_vehicle_1
Categorical
| Distinct | 16 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 781.2 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Driver Inattention/Distraction | 25605 | 25.6% |
| Unspecified | 25253 | 25.3% |
| Following Too Closely | 7530 | 7.5% |
| Other_factor | 6994 | 7.0% |
| Failure to Yield Right-of-Way | 6023 | 6.0% |
| Backing Unsafely | 4033 | 4.0% |
| Passing or Lane Usage Improper | 3979 | 4.0% |
| Passing Too Closely | 3676 | 3.7% |
| Other Vehicular | 3071 | 3.1% |
| Unsafe Lane Changing | 2588 | 2.6% |
| Other values (6) | 11248 | 11.2% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 30 |
|---|---|
| Median length | 19 |
| Mean length | 20.43398 |
| Min length | 11 |
contributing_factor_vehicle_2
Categorical
| Distinct | 17 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 781.2 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Unspecified | 67739 | 67.7% |
| Other_factor | 19861 | 19.9% |
| Driver Inattention/Distraction | 5284 | 5.3% |
| Following Too Closely | 1296 | 1.3% |
| Other Vehicular | 1249 | 1.2% |
| Passing or Lane Usage Improper | 802 | 0.8% |
| Failure to Yield Right-of-Way | 716 | 0.7% |
| Passing Too Closely | 538 | 0.5% |
| Unsafe Lane Changing | 402 | 0.4% |
| Unsafe Speed | 383 | 0.4% |
| Other values (7) | 1730 | 1.7% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 53 |
|---|---|
| Median length | 11 |
| Mean length | 13.00979 |
| Min length | 11 |
contributing_factor_vehicle_3
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 781.2 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Other_factor | 91338 | 91.3% |
| Unspecified | 8197 | 8.2% |
| Following Too Closely | 176 | 0.2% |
| Other Vehicular | 171 | 0.2% |
| Driver Inattention/Distraction | 118 | 0.1% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 30 |
|---|---|
| Median length | 12 |
| Mean length | 11.96024 |
| Min length | 11 |
collision_id
Real number (ℝ≥0)
| Distinct | 100000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4226109.341 |
|---|---|
| Minimum | 2568 |
| Maximum | 4353706 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 781.2 KiB |
Quantile statistics
| Minimum | 2568 |
|---|---|
| 5-th percentile | 3665427.95 |
| Q1 | 4182342.75 |
| median | 4300224 |
| Q3 | 4328315.25 |
| 95-th percentile | 4348345.05 |
| Maximum | 4353706 |
| Range | 4351138 |
| Interquartile range (IQR) | 145972.5 |
Descriptive statistics
| Standard deviation | 165356.0511 |
|---|---|
| Coefficient of variation (CV) | 0.03912725341 |
| Kurtosis | 45.22161792 |
| Mean | 4226109.341 |
| Median Absolute Deviation (MAD) | 51882.5 |
| Skewness | -3.965406795 |
| Sum | 4.226109341e+11 |
| Variance | 2.734262364e+10 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 4327423 | 1 | < 0.1% |
| 4308023 | 1 | < 0.1% |
| 4185140 | 1 | < 0.1% |
| 4344412 | 1 | < 0.1% |
| 4342365 | 1 | < 0.1% |
| 4348510 | 1 | < 0.1% |
| 4346463 | 1 | < 0.1% |
| 4172384 | 1 | < 0.1% |
| 4301409 | 1 | < 0.1% |
| 4307554 | 1 | < 0.1% |
| Other values (99990) | 99990 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2568 | 1 | < 0.1% |
| 69010 | 1 | < 0.1% |
| 74294 | 1 | < 0.1% |
| 127733 | 1 | < 0.1% |
| 210591 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4353706 | 1 | < 0.1% |
| 4353705 | 1 | < 0.1% |
| 4353701 | 1 | < 0.1% |
| 4353672 | 1 | < 0.1% |
| 4353663 | 1 | < 0.1% |
vehicle_type_code_1
Categorical
| Distinct | 8 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 781.2 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Sedan | 46790 | 46.8% |
| Station Wagon/Sport Utility Vehicle | 35766 | 35.8% |
| Other_code | 7217 | 7.2% |
| Taxi | 3478 | 3.5% |
| Pick-up Truck | 2615 | 2.6% |
| Box Truck | 1946 | 1.9% |
| Bike | 1437 | 1.4% |
| Tractor Truck Diesel | 751 | 0.8% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 35 |
|---|---|
| Median length | 5 |
| Mean length | 16.44119 |
| Min length | 4 |
vehicle_type_code_2
Categorical
| Distinct | 10 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 781.2 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Sedan | 31369 | 31.4% |
| Other_code | 31039 | 31.0% |
| Station Wagon/Sport Utility Vehicle | 24773 | 24.8% |
| Bike | 3586 | 3.6% |
| Taxi | 2300 | 2.3% |
| Pick-up Truck | 2282 | 2.3% |
| Box Truck | 2146 | 2.1% |
| Bus | 1011 | 1.0% |
| Tractor Truck Diesel | 763 | 0.8% |
| Motorcycle | 731 | 0.7% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 35 |
|---|---|
| Median length | 10 |
| Mean length | 14.32417 |
| Min length | 3 |
vehicle_type_code_3
Categorical
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 781.2 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Other_code | 92036 | 92.0% |
| Sedan | 4129 | 4.1% |
| Station Wagon/Sport Utility Vehicle | 3380 | 3.4% |
| Pick-up Truck | 195 | 0.2% |
| Taxi | 187 | 0.2% |
| Box Truck | 73 | 0.1% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 35 |
|---|---|
| Median length | 10 |
| Mean length | 10.63245 |
| Min length | 4 |
crash_day
Categorical
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 781.2 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Friday | 15494 | 15.5% |
| Tuesday | 14653 | 14.7% |
| Thursday | 14573 | 14.6% |
| Wednesday | 14285 | 14.3% |
| Monday | 14242 | 14.2% |
| Saturday | 13964 | 14.0% |
| Sunday | 12789 | 12.8% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 9 |
|---|---|
| Median length | 7 |
| Mean length | 7.14582 |
| Min length | 6 |
crash_month
Real number (ℝ≥0)
| Distinct | 11 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.02299 |
|---|---|
| Minimum | 2 |
| Maximum | 12 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 781.2 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 6 |
| median | 7 |
| Q3 | 8 |
| 95-th percentile | 9 |
| Maximum | 12 |
| Range | 10 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.828325706 |
|---|---|
| Coefficient of variation (CV) | 0.2603343741 |
| Kurtosis | 0.02879865261 |
| Mean | 7.02299 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | -0.2523065626 |
| Sum | 702299 |
| Variance | 3.342774888 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 8 | 26950 | 27.0% |
| 7 | 23589 | 23.6% |
| 9 | 13226 | 13.2% |
| 6 | 11646 | 11.6% |
| 5 | 8426 | 8.4% |
| 4 | 7679 | 7.7% |
| 3 | 4133 | 4.1% |
| 11 | 3603 | 3.6% |
| 12 | 506 | 0.5% |
| 10 | 154 | 0.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 88 | 0.1% |
| 3 | 4133 | 4.1% |
| 4 | 7679 | 7.7% |
| 5 | 8426 | 8.4% |
| 6 | 11646 | 11.6% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 12 | 506 | 0.5% |
| 11 | 3603 | 3.6% |
| 10 | 154 | 0.2% |
| 9 | 13226 | 13.2% |
| 8 | 26950 | 27.0% |
crash_year
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2019.37856 |
|---|---|
| Minimum | 2013 |
| Maximum | 2020 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 781.2 KiB |
Quantile statistics
| Minimum | 2013 |
|---|---|
| 5-th percentile | 2017 |
| Q1 | 2019 |
| median | 2020 |
| Q3 | 2020 |
| 95-th percentile | 2020 |
| Maximum | 2020 |
| Range | 7 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.7793320217 |
|---|---|
| Coefficient of variation (CV) | 0.0003859266594 |
| Kurtosis | 3.639582223 |
| Mean | 2019.37856 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -1.657671392 |
| Sum | 201937856 |
| Variance | 0.6073584 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2020 | 50037 | 50.0% |
| 2019 | 43881 | 43.9% |
| 2017 | 5871 | 5.9% |
| 2018 | 148 | 0.1% |
| 2015 | 43 | < 0.1% |
| 2013 | 19 | < 0.1% |
| 2014 | 1 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2013 | 19 | < 0.1% |
| 2014 | 1 | < 0.1% |
| 2015 | 43 | < 0.1% |
| 2017 | 5871 | 5.9% |
| 2018 | 148 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2020 | 50037 | 50.0% |
| 2019 | 43881 | 43.9% |
| 2018 | 148 | 0.1% |
| 2017 | 5871 | 5.9% |
| 2015 | 43 | < 0.1% |
combine_location
Categorical
| Distinct | 44606 |
|---|---|
| Distinct (%) | 44.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 781.2 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| (nan,nan) | 8035 | 8.0% |
| (0.0,0.0) | 169 | 0.2% |
| (40.861862,-73.91282) | 79 | 0.1% |
| (40.8047,-73.91243) | 55 | 0.1% |
| (40.820305,-73.89083000000001) | 52 | 0.1% |
| (40.675734999999996,-73.89686) | 48 | < 0.1% |
| (40.696033,-73.98453) | 48 | < 0.1% |
| (40.658577,-73.89063) | 47 | < 0.1% |
| (40.737784999999995,-73.93496) | 43 | < 0.1% |
| (40.733536,-73.87035) | 41 | < 0.1% |
| Other values (44596) | 91383 | 91.4% |
Unique
| Unique | 29003 |
|---|---|
| Unique (%) | 29.0% |
Length
| Max length | 39 |
|---|---|
| Median length | 21 |
| Mean length | 23.06706 |
| Min length | 9 |
nearest_street
Categorical
| Distinct | 27727 |
|---|---|
| Distinct (%) | 27.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 781.2 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| unspecified | 26908 | 26.9% |
| 3 AVENUE | 432 | 0.4% |
| BROADWAY | 424 | 0.4% |
| 2 AVENUE | 340 | 0.3% |
| LINDEN BOULEVARD | 280 | 0.3% |
| 5 AVENUE | 247 | 0.2% |
| ATLANTIC AVENUE | 240 | 0.2% |
| 1 AVENUE | 237 | 0.2% |
| 7 AVENUE | 229 | 0.2% |
| PARK AVENUE | 222 | 0.2% |
| Other values (27717) | 70441 | 70.4% |
Unique
| Unique | 22498 |
|---|---|
| Unique (%) | 22.5% |
Length
| Max length | 40 |
|---|---|
| Median length | 13 |
| Mean length | 19.56421 |
| Min length | 1 |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
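As a minimal sketch of the calculation just described, the following pure-Python function divides the covariance by the product of the standard deviations. The function name `pearson_r` and the sample values are illustrative assumptions and are not taken from the dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The common 1/n factor cancels between numerator and denominator,
    # so plain sums of deviations are sufficient.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Illustrative values only: a perfectly linear relationship y = 2x + 1.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
r_up = pearson_r(x, y)
# Shifting and rescaling y (with a positive factor) leaves r unchanged.
r_shifted = pearson_r(x, [10 * v - 4 for v in y])
# Reversing the direction of the relationship flips the sign.
r_down = pearson_r(x, [-v for v in y])
```

The sketch does not handle constant inputs, for which the standard deviation is zero and r is undefined.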
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
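The rank-based calculation can be sketched as follows, assuming tied values receive the average of their ranks (a common convention, stated here as an assumption). The helper names and sample values are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (std_x * std_y)


# Illustrative values only: y = x**3 is nonlinear but strictly increasing.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
tied = average_ranks([10, 20, 20, 30])
```

Because the relationship is strictly increasing, the ranks of x and y coincide and ρ reaches its maximum even though the relationship is not linear.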
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
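The pair-counting definition above can be sketched directly. This version applies no correction for ties (often called tau-a); library implementations may use a tie-corrected variant, so results on tied data can differ. The function name and sample values are illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    pairs = list(combinations(range(len(xs)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 is a tie in x or y and counts as neither.
    return (concordant - discordant) / len(pairs)


# Illustrative values only: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
```

With four observations there are six pairs; five are concordant and one is discordant, giving τ = (5 - 1) / 6.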
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. The Phik project provides extensive documentation.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
First rows
| | crash_date | crash_time | borough | zip_code | location | on_street_name | number_of_persons_injured | number_of_persons_killed | number_of_pedestrians_injured | number_of_pedestrians_killed | number_of_cyclist_injured | number_of_cyclist_killed | number_of_motorist_injured | number_of_motorist_killed | contributing_factor_vehicle_1 | contributing_factor_vehicle_2 | contributing_factor_vehicle_3 | collision_id | vehicle_type_code_1 | vehicle_type_code_2 | vehicle_type_code_3 | crash_day | crash_month | crash_year | combine_location | nearest_street |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-08-03 | 17:25 | STATEN ISLAND | 10307 | (40.501465, -74.24523) | SWINNERTON STREET | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | 4182249 | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | Saturday | 8 | 2019 | (40.501465,-74.24523) | CLERMONT AVENUE |
| 1 | 2019-09-07 | 0:36 | STATEN ISLAND | 10307 | (40.50331, -74.237465) | SPRAGUE AVENUE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unsafe Speed | Unspecified | Unspecified | 4201115 | Station Wagon/Sport Utility Vehicle | Sedan | Station Wagon/Sport Utility Vehicle | Saturday | 9 | 2019 | (40.50331,-74.237465) | unspecified |
| 2 | 2019-08-17 | 15:00 | STATEN ISLAND | 10307 | (40.503387, -74.24883) | FINLAY STREET | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Other_factor | Other_factor | 4198160 | Sedan | Other_code | Other_code | Saturday | 8 | 2019 | (40.503387,-74.24883) | unspecified |
| 3 | 2017-05-06 | 2:55 | STATEN ISLAND | 10307 | (40.503414, -74.24496) | unspecified | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Alcohol Involvement | Unspecified | Unspecified | 3664377 | Station Wagon/Sport Utility Vehicle | Sedan | Sedan | Saturday | 5 | 2017 | (40.503414,-74.24495999999999) | 463 MAIN STREET |
| 4 | 2020-07-08 | 20:20 | STATEN ISLAND | 10307 | (40.50447, -74.243454) | HYLAN BOULEVARD | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Failure to Yield Right-of-Way | Unspecified | Other_factor | 4327159 | Sedan | Sedan | Other_code | Wednesday | 7 | 2020 | (40.50447,-74.243454) | unspecified |
| 5 | 2020-07-06 | 17:00 | STATEN ISLAND | 10307 | (40.504482, -74.24727) | unspecified | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Unspecified | Other_factor | Other_factor | 4326628 | Station Wagon/Sport Utility Vehicle | Other_code | Other_code | Monday | 7 | 2020 | (40.504482,-74.24727) | 171 CARTERET STREET |
| 6 | 2019-07-09 | 12:50 | STATEN ISLAND | 10307 | (40.505527, -74.23819) | HYLAN BOULEVARD | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | Following Too Closely | Unspecified | Other_factor | 4167167 | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | Other_code | Tuesday | 7 | 2019 | (40.505527,-74.23819) | SPRAGUE AVENUE |
| 7 | 2019-07-20 | 16:30 | STATEN ISLAND | 10307 | (40.506187, -74.2349) | HYLAN BOULEVARD | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Failure to Yield Right-of-Way | Unspecified | Other_factor | 4173987 | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | Other_code | Saturday | 7 | 2019 | (40.506187,-74.2349) | JOLINE AVENUE |
| 8 | 2020-03-15 | 4:10 | STATEN ISLAND | 10307 | (40.506187, -74.2349) | JOLINE AVENUE | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Other_factor | Other_factor | 4307254 | Sedan | Other_code | Other_code | Sunday | 3 | 2020 | (40.506187,-74.2349) | HYLAN BOULEVARD |
| 9 | 2020-07-18 | 7:26 | STATEN ISLAND | 10307 | (40.506187, -74.2349) | HYLAN BOULEVARD | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Driver Inattention/Distraction | Other_factor | Other_factor | 4331848 | Sedan | Other_code | Other_code | Saturday | 7 | 2020 | (40.506187,-74.2349) | JOLINE AVENUE |
Last rows
| | crash_date | crash_time | borough | zip_code | location | on_street_name | number_of_persons_injured | number_of_persons_killed | number_of_pedestrians_injured | number_of_pedestrians_killed | number_of_cyclist_injured | number_of_cyclist_killed | number_of_motorist_injured | number_of_motorist_killed | contributing_factor_vehicle_1 | contributing_factor_vehicle_2 | contributing_factor_vehicle_3 | collision_id | vehicle_type_code_1 | vehicle_type_code_2 | vehicle_type_code_3 | crash_day | crash_month | crash_year | combine_location | nearest_street |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 99990 | 2019-11-11 | 14:30 | QUEENS | 11354 | unspecified | 30 AVENUE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Turning Improperly | Unspecified | Other_factor | 4239398 | Sedan | Other_code | Other_code | Monday | 11 | 2019 | (0.0,0.0) | COLLEGE POINT BOULEVARD |
| 99991 | 2019-11-10 | 4:37 | QUEENS | 11103 | unspecified | unspecified | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Alcohol Involvement | Unspecified | Other_factor | 4238627 | Sedan | Station Wagon/Sport Utility Vehicle | Other_code | Sunday | 11 | 2019 | (0.0,0.0) | 28-42 37 STREET |
| 99992 | 2019-11-12 | 0:50 | BROOKLYN | 11211 | unspecified | BROADWAY | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Other_factor | Other_factor | 4240012 | Other_code | Other_code | Other_code | Tuesday | 11 | 2019 | (0.0,0.0) | MARCY AVENUE |
| 99993 | 2019-11-11 | 21:00 | MANHATTAN | 10002 | unspecified | unspecified | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | Other_factor | 4239386 | Station Wagon/Sport Utility Vehicle | Other_code | Other_code | Monday | 11 | 2019 | (0.0,0.0) | 128 PITT STREET |
| 99994 | 2019-11-10 | 4:15 | BROOKLYN | 11226 | unspecified | FLATBUSH AVENUE | 5 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | Other Vehicular | Traffic Control Disregarded | Other_factor | 4238152 | Sedan | Station Wagon/Sport Utility Vehicle | Other_code | Sunday | 11 | 2019 | (0.0,0.0) | SNYDER AVENUE |
| 99995 | 2019-11-11 | 23:11 | BROOKLYN | 11229 | unspecified | KINGS HIGHWAY | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Other_factor | 4239412 | Sedan | Sedan | Other_code | Monday | 11 | 2019 | (0.0,0.0) | OCEAN AVENUE |
| 99996 | 2019-11-10 | 3:46 | QUEENS | 11377 | unspecified | ROOSEVELT AVENUE | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Unspecified | Other_factor | Other_factor | 4239029 | Station Wagon/Sport Utility Vehicle | Other_code | Other_code | Sunday | 11 | 2019 | (0.0,0.0) | 58 STREET |
| 99997 | 2019-11-11 | 21:40 | BROOKLYN | 11221 | unspecified | unspecified | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | Other_factor | 4239762 | Sedan | Sedan | Other_code | Monday | 11 | 2019 | (0.0,0.0) | 829 GATES AVENUE |
| 99998 | 2019-11-11 | 23:30 | MANHATTAN | 10013 | unspecified | unspecified | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing Too Closely | Unspecified | Other_factor | 4239642 | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | Other_code | Monday | 11 | 2019 | (0.0,0.0) | 9 CROSBY STREET |
| 99999 | 2019-11-10 | 3:30 | QUEENS | 11419 | unspecified | unspecified | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | Other_factor | 4238994 | Sedan | Other_code | Other_code | Sunday | 11 | 2019 | (0.0,0.0) | 134-30 ATLANTIC AVENUE |